Iterative MMSE Estimation of Vocal Tract Length Normalization Factors for Voice Transformation
Authors
Abstract
We present a method that determines the optimal configuration of a bilinear vocal tract length normalization function to transform the frequency axis of one voice according to a specific target voice. Given a number of parallel utterances from the speakers involved, the single parameter of this function can be calculated through an iterative procedure that minimizes an objective error measure defined in the cepstral domain. The method is also applicable when multiple warping classes are considered, and it can be complemented with amplitude correction filters. The resulting physically motivated cepstral transformation yields highly satisfactory conversion accuracy and improved quality with respect to standard statistical systems.
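As a rough illustration of what a single-parameter bilinear warping does, the sketch below uses the commonly cited all-pass form of the bilinear frequency warping and picks the parameter by a plain grid search over a squared error between spectral envelopes. This is not the iterative cepstral-domain procedure the abstract describes: the function names, the sign convention for `alpha`, the candidate range, and the use of log-spectral envelopes instead of cepstra are all assumptions made for the example.

```python
import numpy as np


def bilinear_warp(omega, alpha):
    """Map normalized frequencies omega in [0, pi] through a bilinear (all-pass) warp.

    Assumed convention: alpha > 0 moves frequencies upward, alpha = 0 is the identity,
    and |alpha| < 1 keeps the mapping monotonic with 0 and pi as fixed points.
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega), 1.0 - alpha * np.cos(omega))


def estimate_alpha(source_env, target_env, candidates=None):
    """Hypothetical helper: grid search for the alpha with the lowest mean squared error.

    source_env and target_env are 1-D envelopes sampled uniformly on [0, pi].
    A stand-in for illustration only, not the paper's iterative MMSE estimator.
    """
    if candidates is None:
        candidates = np.linspace(-0.3, 0.3, 61)  # assumed search range
    omega = np.linspace(0.0, np.pi, len(source_env))
    errors = []
    for alpha in candidates:
        # The value at source frequency w moves to bilinear_warp(w, alpha);
        # resample the moved envelope back onto the uniform grid.
        warped = np.interp(omega, bilinear_warp(omega, alpha), source_env)
        errors.append(np.mean((warped - target_env) ** 2))
    return float(candidates[int(np.argmin(errors))])


# Synthetic check: warp a smooth envelope with a known alpha, then recover it.
omega = np.linspace(0.0, np.pi, 257)
source = np.cos(3.0 * omega) + 0.5 * np.cos(7.0 * omega)
target = np.interp(omega, bilinear_warp(omega, 0.1), source)
alpha_hat = estimate_alpha(source, target)
```

With real speech, the envelopes would come from time-aligned frames of the parallel utterances, and the error would be accumulated over all frame pairs before choosing the parameter.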
Similar resources
Auditory Filterbank Improves Voice Morphing
This paper presents a new method for vocal tract length (VTL) estimation and normalization based on a gammachirp auditory filterbank (GCFB) to improve the sound quality in voice morphing. VTL ratios between 28 speakers were estimated based on the spectral distances for all permutations (756 = 28P2). The VTL estimation using the mel-frequency filterbank (MFFB), which is a preprocessor for calc...
Rapid vocal tract length normalization using maximum likelihood estimation
Recently, vocal tract length normalization (VTLN) techniques have been developed for speaker normalization in speech recognition. This paper proposes a new VTLN method, in which the vocal tract length is normalized in the cepstrum space by means of a linear mapping whose parameter is derived using maximum likelihood estimation. The computational costs of this method are much lower than those of suc...
Efficient pitch-based estimation of VTLN warp factors
To reduce inter-speaker variability, vocal tract length normalization (VTLN) is commonly used to transform acoustic features for automatic speech recognition (ASR). The warp factors used in this process are usually derived by maximum likelihood (ML) estimation, involving an exhaustive search over possible values. We describe an alternative approach: exploit the correlation between a speaker’s a...
Effects of Voice Therapy on Vocal Tract Discomfort in Muscle Tension Dysphonia
Introduction: Patients with muscle tension dysphonia (MTD) suffer from several physical discomforts in their vocal tract. However, few studies have examined the effects of voice therapy (VT) on the vocal tract discomfort (VTD) in patients with voice disorders. Therefore, the aim of the present study was to investigate the effects of VT on the VTD in patients with MTD. Materi...
Publication date: 2012